Random Costs in Combinatorial Optimization 

The random cost problem is the problem of finding the minimum in an
exponentially long list of random numbers. By definition, this problem
cannot be solved faster than by exhaustive search. It is shown that a
classical NP-hard optimization problem, number partitioning, is
essentially equivalent to the random cost problem. This explains the bad
performance of heuristic approaches to the number partitioning problem
and allows us to calculate the probability distributions of the optimum
and sub-optimum costs.

Recent years have witnessed an increasing interaction among the
disciplines of discrete mathematics, computer science and statistical
physics. In particular, the methods and concepts developed in spin glass
theory have been applied successfully to problems from combinatorial
optimization. An optimization problem is defined by a set $X$ (the
domain) of feasible solutions $\sigma \in X$ and a real valued function
$H$ on $X$. The minimization problem then is: find the $\sigma \in X$
that minimizes $H(\sigma)$. In combinatorial optimization $X$ is always
countable. For minimization problems, $H$ is called the cost function;
physicists call it the Hamiltonian or energy.

Most problems in combinatorial optimization are NP-hard, which means
that no algorithm is known that solves the problem significantly faster
than exhaustive search of the domain. Although it has not been proven,
it is widely believed that no faster algorithms exist for NP-hard
problems. A problem for which this can be proven is the random cost
problem: here the cost function is a table of random numbers and the
role of $\sigma$ is reduced to an index. It can be proven that one has
to look at every number in the table to find the minimum [5].
Furthermore it is obvious that there is no better heuristic than
repeated random lookup. In this sense the random cost problem is harder
than many other NP-hard problems, for which much better heuristics do
exist.
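
As an illustration of the random cost problem described above, the following minimal Python sketch builds a table of random costs and compares exhaustive search with repeated random lookup. The function names and the choice of costs uniform on [0, 1) are assumptions made for this sketch only; they do not come from the paper.

```python
import random

def random_cost_table(n_bits, seed=0):
    """Table of 2**n_bits random costs (assumed uniform on [0, 1))."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(2 ** n_bits)]

def exhaustive_minimum(table):
    """The exact minimum; this has to read every entry of the table."""
    return min(table)

def random_lookup(table, n_lookups, seed=1):
    """Heuristic: best cost seen in n_lookups random lookups (with repetition)."""
    rng = random.Random(seed)
    return min(table[rng.randrange(len(table))] for _ in range(n_lookups))
```

By construction the heuristic can never return a value below the exhaustive minimum, and nothing in the table gives a hint where to look next, which is the sense in which no smarter heuristic exists.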

In this contribution it is shown that the random cost problem is not an
artificial toy problem but a valid description of at least one classical
problem from combinatorial optimization: the number partitioning problem
(NPP).

Number partitioning is one of Garey and Johnson's six basic NP-complete
problems that lie at the heart of the theory of NP-completeness. It is
defined as follows: Given a set $\{a_1, a_2, \ldots, a_N\}$ of positive
numbers, find a partition, i.e. two disjoint subsets $A_1$ and $A_2$,
such that the residue

$$E = \Bigl| \sum_{a_j \in A_1} a_j - \sum_{a_j \in A_2} a_j \Bigr| \qquad (1)$$

is minimized. In the balanced number partitioning problem, the
optimization is restricted to partitions with $|A_1| = |A_2| = N/2$
($N$ even). A partition can be encoded by Ising spins $s_j = \pm 1$:
$s_j = +1$ if $a_j \in A_1$, $s_j = -1$ otherwise. The cost function
then reads

$$E = \Bigl| \sum_{j=1}^{N} a_j s_j \Bigr| \qquad (2)$$

and the minimum partition is equivalent to the ground state of the
Hamiltonian

$$H = E^2 = \sum_{i,j=1}^{N} s_i\, a_i a_j\, s_j. \qquad (3)$$

In statistical mechanics, this is an infinite range Ising spin glass
with Mattis-like, antiferromagnetic couplings $J_{ij} = -a_i a_j$.
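
The spin encoding above translates directly into code. The following is a minimal sketch, assuming small $N$ so that all $2^{N-1}$ configurations can be enumerated; the names `residue` and `brute_force_npp` are hypothetical and chosen for this illustration.

```python
from itertools import product

def residue(a, s):
    """Cost E = |sum_j a_j s_j| of the partition encoded by spins s_j = +1/-1."""
    return abs(sum(aj * sj for aj, sj in zip(a, s)))

def brute_force_npp(a, balanced=False):
    """Exhaustive search over spin configurations; returns (E_min, spins).

    With balanced=True only configurations with sum_j s_j = 0 are considered;
    for odd N there are none and (None, None) is returned.
    """
    best_e, best_s = None, None
    # Fix s_1 = +1: flipping all spins leaves E unchanged, which halves the search.
    for tail in product((1, -1), repeat=len(a) - 1):
        s = (1,) + tail
        if balanced and sum(s) != 0:
            continue
        e = residue(a, s)
        if best_e is None or e < best_e:
            best_e, best_s = e, s
    return best_e, best_s
```

For example, `brute_force_npp([4, 5, 6, 7, 8])` finds the perfect partition {7, 8} versus {4, 5, 6} with residue 0. The running time grows like $2^N$, so this is only useful for checking small instances.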

The computational complexity of the NPP depends on the number of bits
needed to encode the numbers $a_j$. Numerical simulations show that for
independent, identically distributed (i.i.d.) random $b$-bit numbers
$a_j$, the solution time grows exponentially with $N$ for $N < b$
(roughly) and polynomially for $N > b$ [10-12]. The transition from the
computationally "hard" to the "easy" phase has some features of a phase
transition in physical systems. Phase transitions of this kind have been
observed in numerous NP-complete problems and can often be analyzed
quantitatively in the framework of statistical mechanics. Compared to
other problems, this analysis is surprisingly simple for the number
partitioning problem.

Here we concentrate on the computationally hard regime $N \ll b$, i.e.
we consider the $a_j$ to be real numbers of "infinite" precision. For
this case, Karmarkar et al. [16] have proven that the median value of
the optimum residue $E_1$ is $O(\sqrt{N} \cdot 2^{-N})$ for the
unconstrained and $O(N \cdot 2^{-N})$ for the balanced partitioning
problem. Their proof, however, yields no results on the distribution of
$E_1$, not even on its average value. Numerical simulations [7] indicate
that the relative width of the distribution of $E_1$, defined as

$$r \equiv \frac{\sqrt{\langle E_1^2 \rangle - \langle E_1 \rangle^2}}{\langle E_1 \rangle}, \qquad (4)$$

where $\langle \cdot \rangle$ denotes the average over the $a_j$'s,
tends to a finite value in the large $N$ limit, more precisely
$\lim_{N \to \infty} r = 1$, for both the unconstrained and the balanced
partitioning problem. This means that the ground state energy is a
non-self-averaging quantity.

Another surprising feature of the NPP is the bad performance of
heuristic algorithms. The best known heuristic, the differencing method,
yields partitions with expected residue $O(N^{-\alpha \log N})$,
$\alpha > 0$, for $a_j$ distributed uniformly between 0 and 1. This is
still bad compared to $O(\sqrt{N} \cdot 2^{-N})$ for the true optimum.
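
For readers unfamiliar with the differencing method, here is a minimal sketch, assuming the standard Karmarkar-Karp rule of repeatedly replacing the two largest numbers by their difference. The function name `kk_residue` is hypothetical, and the sketch returns only the residue, not the partition itself.

```python
import heapq

def kk_residue(a):
    """Residue of the partition found by the differencing heuristic."""
    heap = [-x for x in a]  # heapq is a min-heap, so store negated values
    heapq.heapify(heap)
    while len(heap) > 1:
        x = -heapq.heappop(heap)  # largest remaining number
        y = -heapq.heappop(heap)  # second largest
        # Putting x and y on opposite sides leaves only their difference to place.
        heapq.heappush(heap, -(x - y))
    return -heap[0] if heap else 0
```

On the instance [8, 7, 6, 5, 4] the heuristic ends with residue 2, although a perfect partition ({8, 7} versus {6, 5, 4}) exists, which illustrates that the heuristic solution can miss the optimum even on tiny instances.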

In this contribution we show that all these features can be understood
qualitatively and quantitatively by the observation that number
partitioning is essentially equivalent to a random cost problem. Our
line of reasoning closely follows Derrida, who introduced the random
energy model (REM) in spin glass theory. The random cost problem is the
optimization counterpart of the REM, with some modifications, as we will
see below.

In the balanced NPP, the energies are distributed according to

$$P(E) = \binom{N}{N/2}^{-1} {\sum_{\{s_j\}}}' \Bigl\langle \delta\Bigl(E - \Bigl|\sum_{j=1}^{N} a_j s_j\Bigr|\Bigr) \Bigr\rangle, \qquad (5)$$

where the primed sum runs over all spin configurations with
$\sum_j s_j = 0$. The symmetry of the problem and our assumption of
i.i.d. random variables $a_j$ allow us to write

$$P(E) = 2\,\Theta(E)\, \Bigl\langle \delta\Bigl(E - \sum_{j=1}^{N/2} (a_j - a_{j+N/2})\Bigr) \Bigr\rangle, \qquad (6)$$

where $\Theta$ denotes the step function, $\Theta(x) = 1$ for
$x \ge 0$ and $\Theta(x) = 0$ for $x < 0$. The symmetrization
$a^{\mathrm{sym}}$ of $a$, i.e. the random variable distributed as the
difference $a_1 - a_2$ of two independent variables, has mean 0 and
variance $2\sigma^2$ with $\sigma^2 = \langle a^2 \rangle - \langle a \rangle^2$.
If $g_k$ denotes the density of the $k$-th partial sum of
$a^{\mathrm{sym}}$ we can write $P(E) = 2 g_{N/2}(E)\,\Theta(E)$, which
according to the central limit theorem becomes

$$P(E) = \frac{2\,\Theta(E)}{\sqrt{2\pi\sigma^2 N}} \exp\Bigl(-\frac{E^2}{2\sigma^2 N}\Bigr) \qquad (7)$$

for large values of $N$. The energies in the unconstrained NPP follow
the same distribution but with $\sigma^2$ replaced by
$\langle a^2 \rangle$.
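
The half-Gaussian form of $P(E)$ can be checked with a short Monte Carlo experiment. The sketch below is an illustration under stated assumptions: the $a_j$ are taken uniform on [0, 1), so $\sigma^2 = 1/12$, $N$ is even, and the function names are hypothetical. It compares the sampled mean energy of balanced partitions with the mean $\sqrt{2\sigma^2 N/\pi}$ of the half-Gaussian with variance parameter $\sigma^2 N$.

```python
import math
import random

def sample_balanced_energy(n, rng):
    """Draw a_j uniform on [0, 1) and return E = |sum_j a_j s_j| for a balanced s."""
    a = [rng.random() for _ in range(n)]
    half = n // 2
    # The a_j are i.i.d., so any fixed balanced configuration is representative:
    # first half s_j = +1, second half s_j = -1.
    return abs(sum(a[:half]) - sum(a[half:]))

def mean_energy(n, samples, seed=0):
    """Monte Carlo estimate of the average energy <E> for even n."""
    rng = random.Random(seed)
    return sum(sample_balanced_energy(n, rng) for _ in range(samples)) / samples

def predicted_mean(n, sigma2):
    """Mean of the half-Gaussian density: sqrt(2 * sigma^2 * N / pi)."""
    return math.sqrt(2.0 * sigma2 * n / math.pi)
```

With `mean_energy(100, 20000)` and `predicted_mean(100, 1/12)` the two numbers should agree to within statistical error, since the central limit theorem is already accurate at $N = 100$.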

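The probability density of finding energies $E_1$ and $E_2$ is

$$P(E_1, E_2) = 4\,\Theta(E_1)\Theta(E_2) \binom{N}{N/2}^{-2} {\sum_{\{s_j\}}}' {\sum_{\{s'_j\}}}' \Bigl\langle \delta\Bigl(E_1 - \sum_j a_j s_j\Bigr)\, \delta\Bigl(E_2 - \sum_j a_j s'_j\Bigr) \Bigr\rangle \qquad (8)$$

for the balanced NPP. Again we use the gauge invariance to state that
each term in the above sum depends on $\{s_j\}$ and $\{s'_j\}$ only
through the overlap

$$Q = \sum_{j=1}^{N} s_j s'_j. \qquad (9)$$

Then

$$P(E_1, E_2) = 4\,\Theta(E_1)\Theta(E_2) \binom{N}{N/2}^{-1} {\sum_{Q}}' \binom{N/2}{(N+Q)/4}^{2} P_Q(E_1, E_2), \qquad (10)$$

where the primed sum denotes summation over
$Q = -N, -N+4, \ldots, N-4, N$ and

$$P_Q(E_1, E_2) = \frac{1}{2} \cdot g_{(N+Q)/4}\Bigl(\frac{E_1 + E_2}{2}\Bigr) \cdot g_{(N-Q)/4}\Bigl(\frac{E_1 - E_2}{2}\Bigr) \qquad (11)$$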

(see above for the definition of $g$). For large values of $N$, the
central limit theorem tells us that

$$P_Q(E_1, E_2) \approx \frac{1}{2\pi\sigma^2 N \sqrt{1-q^2}} \exp\Bigl(-\frac{(E_1+E_2)^2}{4\sigma^2 N (1+q)} - \frac{(E_1-E_2)^2}{4\sigma^2 N (1-q)}\Bigr) \qquad (12)$$

with $q = Q/N$. In the same limit we may apply Stirling's formula to the
binomial coefficients and replace the sum over $Q$ by an integral over
$q$:

$$P(E_1, E_2) = 2\sqrt{\frac{2N}{\pi}}\; \Theta(E_1)\Theta(E_2) \int_{-1}^{1} \frac{dq}{1-q^2}\; e^{-N f(q)}\; P_{Q=qN}(E_1, E_2) \qquad (13)$$

with

$$f(q) = \frac{1}{2}(1+q)\ln(1+q) + \frac{1}{2}(1-q)\ln(1-q). \qquad (14)$$

The integral can be evaluated using the saddle point approximation. For
$E_1$ and $E_2$ both $O(\sqrt{N})$, the saddle point is at $q = 0$:

$$P(E_1, E_2) \approx \frac{2\,\Theta(E_1)\Theta(E_2)}{\pi\sigma^2 N} \exp\Bigl(-\frac{E_1^2 + E_2^2}{2\sigma^2 N}\Bigr), \qquad (15)$$

i.e. $P(E_1, E_2) = P(E_1) \cdot P(E_2)$. A similar calculation shows
that $P(E_1, E_2)$ factorizes for the unconstrained NPP, too. Note that
for $E = O(N)$ the saddle point is no longer at $q = 0$ and
$P(E_1, E_2)$ does not factorize. This is plausible, since energies
$O(N)$ can only be achieved by putting a number $O(N)$ of the lowest
values $a_j$ into one partition and a number $O(N)$ of the largest
values in the complement. The corresponding spin sequences then have an
overlap $O(N)$.

The two basic properties that lead to the factorization of $P(E, E')$
are the gauge invariance, i.e. the fact that
$\langle \delta(E - \sum_j a_j s_j)\, \delta(E' - \sum_j a_j s'_j) \rangle$
only depends on the overlap $q$ of the sequences, and the entropic
dominance of the $q = 0$ contributions. Both properties persist if one
considers the probability distributions of three or more levels, so we
claim that $P(E_1, E_2, \ldots, E_k)$ factorizes as well. Instead of
providing a formal derivation, we consider this as an assumption and
discuss its consequences.

Motivated by the factorization of the distribution of energies, we may
now specify our random cost problem: Given are $M = O(2^N)$ random
numbers $E_i$, independently drawn from the density $P(E)$, Eq. (7).
Find the minimum of these numbers. The correspondence to the NPP
requires $M = \frac{1}{2}\binom{N}{N/2}$ for the balanced and
$M = 2^{N-1}$ for the unconstrained case.

FIG. 1. Distribution of scaled ground state energies for the balanced
number partitioning problem. The solid line is given by Eq. (18), the
symbols are averages over $10^4$ random samples.

Let $E_k$ denote the $k$-th lowest energy of an instance of our random
cost problem. The independence of the $E_i$ enables us to write

$$p_1(E_1) = M \cdot P(E_1) \cdot \Bigl(1 - \int_0^{E_1} P(E')\,dE'\Bigr)^{M-1} \qquad (16)$$

for the probability density $p_1$ of the minimum energy. $E_1$ must be
small to get a finite r.h.s. in the large $M$ limit. Hence we may write

$$p_1(E_1) \approx M \cdot P(0) \cdot \bigl(1 - E_1 P(0)\bigr)^{M-1} \approx M \cdot P(0) \cdot e^{-M P(0) E_1}.$$

This means that the probability density of the scaled minimal energy,

$$\varepsilon_1 = M \cdot P(0) \cdot E_1, \qquad (17)$$

for large $M$ converges to a simple exponential distribution,

$$p_1(\varepsilon) = e^{-\varepsilon} \cdot \Theta(\varepsilon). \qquad (18)$$
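
The exponential law for the scaled minimum can be illustrated by simulating the random cost problem directly. This is a minimal sketch under stated assumptions: the $M$ costs are drawn as absolute values of Gaussian numbers with standard deviation `sigma_n` (standing in for $\sigma\sqrt{N}$), $M$ is kept small enough to simulate, and the function name `scaled_minima` is hypothetical.

```python
import math
import random

def scaled_minima(M, sigma_n, trials, seed=0):
    """Scaled minimum M * P(0) * E_1 for `trials` instances of the random cost problem.

    Each instance consists of M independent half-Gaussian costs |x|,
    x ~ Normal(0, sigma_n^2), so that P(0) = 2 / sqrt(2 * pi * sigma_n^2).
    """
    rng = random.Random(seed)
    p0 = 2.0 / math.sqrt(2.0 * math.pi * sigma_n ** 2)
    result = []
    for _ in range(trials):
        e_min = min(abs(rng.gauss(0.0, sigma_n)) for _ in range(M))
        result.append(M * p0 * e_min)
    return result
```

For an exponential distribution with unit mean, both the sample mean and the relative width (standard deviation divided by mean) of the returned values should be close to 1, up to statistical fluctuations and corrections of order $1/M$.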



Note that a rigorous derivation from Eq. (16) to Eq. (18) can be found
in any textbook on extreme order statistics [22]. Along similar lines
one can show that the density $p_k$ of the $k$-th lowest scaled energy
is

$$p_k(\varepsilon) = \frac{\varepsilon^{k-1}}{(k-1)!} \cdot e^{-\varepsilon} \cdot \Theta(\varepsilon), \qquad k = 2, 3, \ldots \qquad (19)$$
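
One way to see where the density of the $k$-th lowest scaled energy comes from is a Poisson argument. This is a sketch, not the textbook derivation the text refers to. Each of the $M$ energies falls below $\varepsilon/(M P(0))$ with probability approximately $\varepsilon/M$, so for large $M$ the number of scaled energies below $\varepsilon$ is Poisson distributed with mean $\varepsilon$. The $k$-th lowest scaled energy exceeds $\varepsilon$ exactly when fewer than $k$ energies lie below it:

```latex
\mathrm{Prob}(\varepsilon_k > \varepsilon)
  = \sum_{j=0}^{k-1} \frac{\varepsilon^{j}}{j!}\, e^{-\varepsilon},
\qquad
p_k(\varepsilon)
  = -\frac{d}{d\varepsilon}\, \mathrm{Prob}(\varepsilon_k > \varepsilon)
  = \sum_{j=0}^{k-1} \frac{\varepsilon^{j}}{j!}\, e^{-\varepsilon}
    - \sum_{j=1}^{k-1} \frac{\varepsilon^{j-1}}{(j-1)!}\, e^{-\varepsilon}
  = \frac{\varepsilon^{k-1}}{(k-1)!}\, e^{-\varepsilon},
\qquad \varepsilon \ge 0 .
```

The two sums telescope, leaving only the $j = k-1$ term; for $k = 1$ this reduces to the simple exponential distribution of the scaled minimum.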



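Let us compare Eqs. (18) and (19) with other analytical and numerical
results. From the moments of the exponential distribution Eq. (18),
$\langle \varepsilon^n \rangle = n!$, we get

$$r = \frac{\sqrt{\langle E_1^2 \rangle - \langle E_1 \rangle^2}}{\langle E_1 \rangle} = 1, \qquad (20)$$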



in perfect agreement with the numerical findings of Ferreira and
Fontanari. The average ground state energy is
$\langle E_1 \rangle = 1/(M \cdot P(0))$, which gives

$$\langle E_1 \rangle = \pi \cdot \sigma \cdot N \cdot 2^{-N} \qquad (21)$$

for the balanced and

$$\langle E_1 \rangle = \sqrt{2\pi \langle a^2 \rangle} \cdot \sqrt{N} \cdot 2^{-N} \qquad (22)$$

for the unconstrained NPP. Again this is in very good agreement with
numerical and analytical results.
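
The two expressions for the average ground state energy follow from $\langle E_1 \rangle = 1/(M \cdot P(0))$ in a few lines. The worked steps below are a sketch that uses Stirling's formula for the central binomial coefficient, so the balanced result holds to leading order in $N$:

```latex
% Balanced case: M = (1/2) binom(N, N/2), P(0) taken from the Gaussian limit of P(E)
M = \frac{1}{2}\binom{N}{N/2} \approx \frac{2^{N}}{\sqrt{2\pi N}},
\qquad
P(0) = \frac{2}{\sqrt{2\pi\sigma^{2} N}}
\;\Longrightarrow\;
M \cdot P(0) \approx \frac{2^{N}}{\pi\,\sigma\,N},
\qquad
\langle E_1 \rangle \approx \pi\,\sigma\,N\,2^{-N}.

% Unconstrained case: M = 2^{N-1}, and \sigma^2 is replaced by <a^2>
M \cdot P(0) = 2^{N-1} \cdot \frac{2}{\sqrt{2\pi\langle a^{2}\rangle N}}
             = \frac{2^{N}}{\sqrt{2\pi\langle a^{2}\rangle N}}
\;\Longrightarrow\;
\langle E_1 \rangle = \sqrt{2\pi\langle a^{2}\rangle}\,\sqrt{N}\,2^{-N}.
```

The extra factor $\sqrt{N}$ in the balanced case comes entirely from the smaller number of allowed configurations, $\binom{N}{N/2}$ instead of $2^N$.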

To check that the random cost ansatz gives more than the correct first
and second moments of $E_1$, we calculated the distribution of $E_1$ and
higher energies numerically. Figs. 1 and 2 display the results for the
balanced NPP. Equivalent plots for the unconstrained NPP look similar.
The agreement between the numerical data and Eqs. (18) and (19) is
convincing. The algorithm used to solve larger instances of the balanced
NPP is described elsewhere.

FIG. 2. Distribution of scaled $k$-th lowest energy for the balanced
number partitioning problem. The solid lines are given by Eq. (19), the
symbols are averages over $10^5$ random samples of size $N = 24$.

All in all, the random cost problem seems to be a valid alternative
formulation of the number partitioning problem. This correspondence not
only provides new analytic results on the NPP but also has some
consequences for the dynamics of algorithms: Any heuristic that exploits
a fraction of the domain, generating and evaluating a series of feasible
configurations, cannot be better than random search. The best solution
found by random search is distributed according to Eq. (16), i.e. the
average heuristic solution should approach the true optimum no faster
than $O(1/M)$, $M$ being the number of configurations generated. Note
that the best known heuristic, the complete Karmarkar-Karp differencing
[12,13], converges more slowly to the true optimum, namely like
$O(1/M^{\alpha})$ with $\alpha < 1$. It would be interesting to check
whether simple random search really converges faster. Beyond number
partitioning, the dynamics of heuristic algorithms for other
combinatorial optimization problems may be considered as a signature of
a corresponding random cost problem, possibly with a differing single
cost distribution.

comment by Jean-Philippe Bouchaud and Marc Mezard.

With its focus on costs rather than configurations, our random cost
problem is very similar to Derrida's random energy model from
statistical mechanics [20,21], with an important difference: the single
energy distribution in Derrida's model is Gaussian, i.e. in principle it
allows arbitrarily low energies. The random cost formulation of the NPP
on the other hand leads to a strict lower bound for the energies. As a
consequence, the two models belong to different universality classes
with respect to their asymptotic order statistics [22]. The replica
method from statistical mechanics solves the Gaussian random energy
model but fails for bounded distributions like the one encountered here
[21]. It is an open problem how to modify the replica method in order to
reproduce the statistical mechanics of the number partitioning problem.